DETAILED ACTION



1. This communication is in response to Remarks, filed 11/21/2005.

2. Claims 1-31 are pending. Claims 1, 5, 21, and 25 are currently amended. 

3. Amendments to the specification have been received and entered. 

4. The 35 USC 112 rejection of claims 25-31 has been withdrawn in view of the 
applicants' arguments. 



Response to Arguments 

5. Applicant's arguments with respect to claims 1-31 have been considered but are 
moot in view of the new ground(s) of rejection. 



Claim Rejections - 35 USC § 103 



6. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

7. Claims 1-3, 5-12, 15-18, 20-23, and 25-28 are rejected under 35 U.S.C. 103(a)
as being unpatentable over Charlesworth et al. (6,990,448) in view of Chang et al. 
(6,917,912). 

As to claims 1, 5, 21 and 25, Charlesworth et al. teach: 
identifying attributes including one or more types of accents and one or more 
types of human languages from a multi-party audio information stream (identifying 
attributes from a communication stream, involving more than one speaker, where for
each language, the speaker's language, accent, dialect and phonetic set are identified, 
col. 9, lines 38-49); 

encoding each identified attribute from the audio information stream into a time 
ordered index, each of the identified attributes sharing a common time reference 
(storing the identified attributes in annotation data within a header (col. 9,
lines 43-49), where the header includes a time index which associates the location of
the blocks of annotation data within the memory, col. 5, lines 52-58);

comparing results at approximately the same time to generate an integrated time 
ordered index of the identified attributes (identifying the language of the speaker, col. 5, 
lines 62-63, and creating a time index associating the location of the blocks that have that
attribute, col. 5, lines 50-67). 

A computer readable storage medium to store the software engine (a personal 
computer with programmable code stored within it, col. 3, lines 1-5).

Charlesworth et al. do not explicitly teach comparing results from different human
language models. 

However, Chang et al. teach comparing the detected phonemes against a
language model to find the correct language model (col. 5, lines 19-23).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to combine the methods of Charlesworth et al. with the language
models of Chang et al. to increase the ability of the system to detect the content of
the verbal communication, based on phoneme processing, as taught by Chang et al.
(col. 5, lines 21-24).

As to claim 2, Charlesworth et al. teach comparing confidence ratings (col. 6,
lines 12-20).

Charlesworth et al. do not teach the confidence ratings of different human
languages. However, Charlesworth et al. teach that the confidence ratings are based on a
phoneme representation of the data, and it would have been obvious to one of ordinary
skill in the art at the time of the invention that, when different human languages are
used to find the correct human language, the weights for the phonemes will be different
for each of the languages.

As to claim 3, Charlesworth et al. teach generating a transcript including each 
spoken word, wherein each spoken word shares the common time reference
(generating a transcript of the spoken data, col. 11, lines 25-30, and generating the
annotation data, where the annotation data contains a time index, col. 5, lines 52-58).

As to claims 6 and 22, Charlesworth et al. teach generating a query on one or 
more of the identified attributes in the time ordered index (generating a query based on 
an attribute, col. 6, lines 24-35).

As to claims 7 and 18, Charlesworth et al. teach correlating a first identified
attribute of the information stream with a second identified attribute having a similar time 
code (grouping attributes under one memory block with similar time codes, col. 5, lines 
45-57). 

As to claim 8, Charlesworth et al. teach the audio information stream comes from
an unstructured information source (inputting conversational language with video, col. 9, 
lines 38-45). 

As to claim 9, Charlesworth et al. teach the audio information stream includes 
audio-visual data (inputting conversational language with video, col. 9, lines 38-45). 

As to claim 10, Charlesworth et al. teach the audio information stream includes 
speech data (inputting conversational language with video, col. 9, lines 38-45).

As to claim 11, Charlesworth et al. teach at least one of the identified attributes
further comprises a change of accent (the speakers are identified along with their
accents, col. 9, lines 38-45, where it would necessarily follow that, if the speakers had
different accents, a change of accent would be identified).

As to claim 12, Charlesworth et al. teach at least one of the identified attributes
further comprises a change of human language (the speakers are identified along with
their language, col. 9, lines 38-45, where it would necessarily follow that, if the speakers
had different languages, a change of language would be identified).

As to claims 15 and 26, Charlesworth et al. teach the time ordered index includes 
a start time and a duration in which each identified attribute was conveyed, (col. 5, lines 
48-57). 

As to claim 16, Charlesworth et al. teach the common time reference comprises 
a time indication (header with time indication, col. 5, lines 38-55). 

As to claim 17, Charlesworth et al. teach the common time reference comprises 
a frame count (header with time information and duration related to a video input, col. 5, 
lines 38-55).

As to claim 20, Charlesworth et al. teach the integrated time ordered index
includes data from different human language models (the time index includes
information about the used vocabulary and language, col. 5, lines 18-30).

As to claim 23, Charlesworth et al. teach: 

converting spoken words in an information stream to written text, the information 
stream containing audio information (transcription unit for the inputted audio stream, col.
11, lines 25-30);

generating a separate encoded file for every word, wherein each encoded file 
shares a common time reference (each of the words is stored and a time index is
created for them, col. 5, lines 18-38). 

As to claim 27, Charlesworth et al. teach one or more attribute filters generate 
time ordered index of the audio information stream in real time (the attributes of the 
audio information are generated as the information is inputted into the system, col. 3, 
lines 5-11). 

As to claim 28, Charlesworth et al. teach the audio information stream passes
through the one or more attribute filters a single time (the audio data is processed as it
is inputted into the system, and the current data is processed by the system, and then
the next data is processed, col. 4, lines 33-42).

8. Claim 13 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Charlesworth et al. in view of Chang et al., as applied to claim 5 above, and further in
view of Kanevsky et al. (6,665,644).

As to claim 13, Charlesworth et al. do not teach at least one of the identified 
attributes further comprises a discrete spoken word. 

However, Kanevsky et al. teach identifying discrete words within the input (col. 4, 
lines 40-50). 

Therefore, it would have been obvious to one of ordinary skill in the art, to 
combine the teachings of Charlesworth et al. with the discrete word recognition of 
Kanevsky et al. to properly identify the user's dialect, as taught by Kanevsky et al. (col. 
4, lines 32-35). 

9. Claim 14 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Charlesworth et al. in view of Chang et al., as applied to claim 5 above, and further in 
view of Lucas (VoiceXML for Web-based distributed conversational applications). 

As to claim 14, Charlesworth et al. do not teach the identified attributes are 
encoded via extensible markup language. 

However, Lucas teaches encoding audio via XML (pages 1-2). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to combine the teachings of Charlesworth et al. with the methods of
Lucas to bring the power of Web development and content delivery to voice-response
applications, as taught by Lucas (page 1).

Allowable Subject Matter 

10. Claims 4, 19, 24 and 29-31 are objected to as being dependent upon a rejected 
base claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

11. The following is a statement of reasons for the indication of allowable subject
matter: 

As to claim 4, Charlesworth et al. (the closest prior art of record) do not teach nor
fairly suggest in combination with claim 1 triggering an event to occur upon an
identification of unique voice characteristics of a speaker in less than five seconds.

As to claim 19, Charlesworth et al. do not teach nor fairly suggest in combination 
with claim 18, the similar time code comprises the first identified attribute possessing a 
start time approximately the same as the second identified attribute or an overlapping of 
the durations associated with the first identified and the second identified attribute.

As to claim 24, Charlesworth et al. do not teach nor fairly suggest in combination
with claim 23 generating a link to relevant material based upon the spoken words and 
synchronizing a display of the link in less than five seconds from analyzing the 
information stream. 

As to claim 29, Charlesworth et al. do not teach nor fairly suggest in combination 
with claim 25 a manipulation module to perform operation on a first set of attributes in 
order to manipulate a second set of attributes. 

As to claim 31, Charlesworth et al. do not teach nor fairly suggest in combination
with claim 25 a triggering and synchronization module to dynamically trigger a link and 
synchronize the appearance of the link based upon a transcribed text from the 
information stream. 

Claim 30 would be allowable since it depends from claim 29, which has been 
indicated to contain allowable subject matter.

Conclusion 

12. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. See PTO-892.

Applicant's amendment necessitated the new ground(s) of rejection presented in
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Thomas E. Shortledge whose telephone number is 
(571) 272-7612. The examiner can normally be reached on M-F 8:00 - 4:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 